Published online 27 October 2011 



Nucleic Acids Research, 2012, Vol. 40, No. 4 1509-1522 

doi:10.1093/nar/gkr869 



The MOF-containing NSL complex associates 
globally with housekeeping genes, but 
activates only a defined subset 

Christian Feller 1 , Matthias Prestel 1 , Holger Hartmann 2 , Tobias Straub 1 , 
Johannes Söding 2 and Peter B. Becker 1,* 

1 Adolf-Butenandt-Institute and Center for Integrated Protein Science of the Ludwig-Maximilians-University, 
Schillerstraße 44, 80336 München, Germany and 2 Gene Center Munich, Department of Chemistry and 
Biochemistry, Ludwig-Maximilians-University, Feodor-Lynen-Straße 25, 81377 Munich, Germany 

Received September 5, 2011; Revised September 27, 2011; Accepted September 28, 2011 



ABSTRACT 

The MOF (males absent on the first)-containing NSL 
(non-specific lethal) complex binds to a subset of 
active promoters in Drosophila melanogaster and 
is thought to contribute to proper gene expression. 
The determinants that target NSL to specific pro- 
moters and the circumstances in which the complex 
engages in regulating transcription are currently 
unknown. Here, we show that the NSL complex pri- 
marily targets active promoters and in particular 
housekeeping genes, at which it colocalizes with 
the chromatin remodeler NURF (nucleosome re- 
modeling factor) and the histone methyltransferase 
Trithorax. However, only a subset of housekeeping 
genes associated with NSL are actually activated by 
it. Our analyses reveal that these NSL-activated pro- 
moters are depleted of certain insulator binding pro- 
teins and are enriched for the core promoter motif 
'Ohler 5'. Based on these results, it is possible to 
predict whether the NSL complex is likely to regulate 
a particular promoter. We conclude that the regula- 
tory capacity of the NSL complex is highly context- 
dependent. Activation by the NSL complex requires 
a particular promoter architecture defined by com- 
binations of chromatin regulators and core promoter 
motifs. 

INTRODUCTION 

Eukaryotic organisms consist of a diversified set of highly 
specialized cells. Their individual identities are determined 
by the appropriate expression of cell-specific genes while a 
battery of genes that are expressed in all cells maintain 
general ('housekeeping') functions. Gene expression at 



the transcriptional level is governed by an intricate inter- 
play between transcription regulators and local chromatin 
organization. In general, the packaging of genomes into 
chromatin brings about a default state of repression, as 
nucleosome assembly constantly competes with transcrip- 
tion factors for promoter binding sites. Overcoming 
this repression requires a concerted action of various 
chromatin-modifying principles. These include ATP- 
dependent nucleosome remodeling factors, which are 
targeted to specific loci by DNA-bound proteins and 
post-translational histone marks where they reorganize 
nucleosomes to facilitate transcription (1). An example 
for such an activity in Drosophila melanogaster is NURF 
(nucleosome remodeling factor), whose large regulatory 
subunit, NURF301, interacts with a diversity of transcrip- 
tion factors and methyl marks on lysine 4 of histone H3 
(H3K4me3) (2,3) (and references therein). NURF has also 
been reported to bind to acetylated lysine 16 of histone H4 
(H4K16ac) (2), a nucleosome modification that prevents 
nucleosome-nucleosome interactions that promote the 
folding of the nucleosomal fiber into more compact struc- 
tures. The acetyltransferase MOF (males absent on the 
first) is a major enzyme responsible for this modification 
in both Drosophila and mammalian cells (4,5). 

MOF is best known for its key role in the Drosophila 
dosage compensation process. It is a subunit of the dosage 
compensation complex [DCC, also known as male-specific 
lethal (MSL) complex], which brings about the 2-fold 
transcriptional activation of genes on the single male X 
chromosome to equalize expression with the correspond- 
ing genes transcribed from the two female X chromosomes 
(6). The DCC is constituted only in male flies and the 
five protein components, MSL1, MSL2, MSL3, maleless 
(MLE) and MOF, as well as the non-coding roX RNAs 
are essential for male viability. According to the current 
model, the DCC recruits MOF to the transcribed regions 
of X-chromosomal genes. Subsequent acetylation of 
H4K16 renders chromatin more accessible and potentially 
facilitates transcriptional elongation (7,8). 

With the exception of MSL2, all DCC protein sub- 
units are also expressed in female flies, and therefore 
also serve more general, yet barely understood functions 
(9). For example, the acetyltransferase MOF appears to be 
involved in more global transcription regulation as it has 
recently been found in an alternative complex together 
with MCRS2, the WD40-repeat protein WDS (will-die- 
slowly), NSL1, NSL2, NSL3 and the plant 
homeodomain (PHD) protein MBD-R2 (10-12). With reference 
to the dosage compensation 'MSL complex', this alterna- 
tive MOF-containing assembly was termed 'NSL 
complex' (for 'non-specific lethal'), as its subunits are 
essential in both sexes (10). The incorporation of MOF 
into either the DCC or the NSL complex is determined 
by association of MOF with the PEHE domains of the 
respective MSL1 or NSL1 subunits (10). Genome-wide 
mapping by chromatin immunoprecipitation (ChIP) 
coupled to DNA microarrays (ChIP-chip) identified 
MOF binding sites at many, but not all active promoters 
in male and female cells (13). Subsequent studies revealed 
that MBD-R2 colocalizes with MOF at many active pro- 
moters in both sexes, suggesting that the NSL complex 
recruits MOF to these sites (12). This is compatible with 
a recent ChIP-Seq study (ChIP DNA analyzed by massively 
parallel sequencing), which found MCRS2 and NSL1 
peaks at promoters in mixed-sex 3rd instar larval 
salivary glands (11). 

In male cells the association of MOF with NSL subunits 
is in competition with its incorporation into the DCC, 
which redirects it to the transcribed regions of X chromo- 
somal genes (12). However, key aspects of MOF's target- 
ing in the context of the NSL complex are unclear. What 
determines the binding of the NSL complex to only a 
subset of the active promoters? The available data are also 
ambiguous regarding the role of the NSL 
complex: does it activate or repress target genes, or 
perhaps both? Ablating the NSL subunit MBD-R2 in 
male embryonic cells resulted in a reduced expression of 
many MBD-R2 target genes (12). In contrast, a similar 
fraction of genes was found up- and downregulated 
when MBD-R2 and NSL3 were depleted in 3rd instar 
salivary glands (11). 

In this study, we created novel data sets and analyzed 
existing ones to compare functional interactions of NSL 
subunits in different developmental tissues to better define 
the targets of the NSL complex. We systematically 
explored the common properties of the NSL target 
genes, searching for colocalizing chromatin factors and 
prevalent sequence motifs in target promoters. We 
traced the NSL complex by monitoring the NSL1 
subunit and found that it preferentially binds to pro- 
moters of housekeeping genes, which are also occupied 
by the chromatin remodeler NURF and the 
methyltransferase Trithorax. There, NSL1 binding correl- 
ates best with the core promoter element DNA 
replication-related element (DRE). However, only a 
defined fraction of NSL1-bound genes are actually 
regulated by the complex. Those promoters are depleted 
of insulator proteins and are enriched for the 
E-box-derived promoter motif 'Ohler 5'. Our analysis pro- 
vides a functional classification of housekeeping genes ac- 
cording to their NSL coregulator requirements. 

MATERIALS AND METHODS 

Generation of the NSL1 antibody 

A cDNA fragment corresponding to NSL1 amino acids 
1271-1550 was amplified by polymerase chain reaction (PCR) 
from cDNA clone #LP09056 (Drosophila 
Genomics Resource Center; see Table 1) and cloned 
into the pGEX2TKN vector. The N-terminally glutathione-S- 
transferase (GST)-tagged NSL1 fragment was expressed 
in Escherichia coli BL21, purified on glutathione beads 
and used to raise antibodies in rabbit by a commercial 
supplier. 

RNA interference in S2 cells, immunoblotting and indirect 
immunofluorescence 

Male Drosophila S2 cell cultivation and RNA interference 
(RNAi) were carried out as described before (12). Briefly, 
1.5 × 10^6 cells were incubated with 10 µg dsRNA 
targeted against NSL1 or GST as a control. Primer 
sequences used for dsRNA production are listed in 
Table 1. Cells were harvested after 6 or 7 days and pro- 
cessed for RNA (see below) and protein. Cells were lysed 
on ice for 10 min in 100 µl of N-buffer per 10^6 cells 
[15 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic 
acid pH 7.5, 60 mM KCl, 15 mM NaCl, 0.5 mM 
ethylene glycol tetraacetic acid pH 8, 0.25% Triton-X, 
10 mM sodium butyrate, 1 mM phenylmethanesulfo- 
nylfluoride, 0.1 mM dithiothreitol, protease inhibitor 
cocktail (Roche)] and the chromatin fraction was 
pelleted by centrifugation. RNA for Affymetrix expres- 
sion profiling was prepared as described (12). RNA 
labeling and cDNA hybridization to a Drosophila 
Genome GeneChip 2.0 were performed at the Gene Center 
Affymetrix Microarray Platform (Munich, Germany). 
Immunoblot analysis and immunofluorescence micros- 
copy (IFM) analysis were performed as described previous- 
ly (14). The lamin antibody was obtained from H. 
Saumweber (Berlin) and the MSL1 antibody was 
described previously (15). 

Table 1. Primer table 

Construct               Forward primer sequence                        Reverse primer sequence 
NSL1 RNAi amplicon 1    TTAATACGACTCACTATAGGGAGCGTCCGAGCTCAACCTTC      TTAATACGACTCACTATAGGGACACATGGGTGTGTTCATTAGTC 
NSL1 RNAi amplicon 2    TTAATACGACTCACTATAGGGAGATGTCGCATCAAAGTCAGAGG   TTAATACGACTCACTATAGGGAGACTCGAGAAGAGCTCGCTGAT 
GST RNAi amplicon       TTAATACGACTCACTATAGGGAGAATGTCCCCTATACTAGGTTA   TTAATACGACTCACTATAGGGAGAACGCATCCAGGCACATTG 
NSL1 antibody cloning   CGCTCCATGGCTTTCATTAAGTTCCCCTGGAGCACC           ATTTCTAGATTAGATGCGTCTGCTGCGAACACCCTC 

Reporter gene ChIP assay and luciferase reporter assay 

The reporter gene ChIP assay and luciferase reporter 
assay have been described before (12). 

Chromatin extraction and immunoprecipitation 

Chromatin extraction and immunoprecipitation were pre- 
viously described (12). Briefly, chromatin extracts from 
sex-sorted adult flies were prepared and the DNA concen- 
tration of the extract was determined. DNA (7.5-15 µg) 
was used for a single ChIP experiment. Five microliters of 
anti-NSL1 serum were used in a single IP reaction. After 
precipitation and extensive washing, DNA was ex- 
tracted with phenol/chloroform, ethanol precipitated and 
further cleaned using the GenElute PCR clean-up kit 
(SIGMA). DNA was amplified using the whole-genome 
amplification kit (WGA, SIGMA). Labeling, hybridiza- 
tion to customized high-resolution NimbleGen tiling 
arrays (comprising the euchromatic part of the entire X 
chromosome, 5 Mb of 2L, 2R and 3L, respectively, as 
well as 10 Mb of 3R) (12), scanning and feature extraction 
were performed by imaGenes (Berlin). 

ChIP-chip data processing 

ChIP-chip data analysis was performed using 
R/Bioconductor (www.r-project.org; www.bioconductor.org). 
Raw signals of the NimbleGen NSL1 ChIP-chip 
were normalized and log2-transformed using the 'vsn' 
package (16). IP/input ratios of the modENCODE data 
were scaled to a mean of zero and a standard deviation of 
one. Promoter enrichments were calculated by 
summarizing the probe level signals in a window of 
600 bp centered at the transcriptional start site (TSS) 
(FlyBase release 5.22). Promoter binding was classified 
based on the bimodal distribution of binding values, 
where genes within the population of lower values were 
considered 'unbound' and genes within the population of 
higher values were considered 'bound'. Alternatively, 
'bound' genes were selected based on the fdr values from the 
'locfdr' package applied to the promoter binding values 
with an fdr cutoff of <0.2. The results are robust to several 
normalization methods and promoter window definitions. 
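
The promoter-level summary described above can be illustrated with a minimal Python sketch. It is not the R/Bioconductor code used in the study: the function names are hypothetical, the mean is assumed as the summary statistic (the text only says the probe signals were 'summarized'), and the binding cutoff is a placeholder for the value that would be read off the bimodal distribution.

```python
from statistics import mean, pstdev

def scale_ratios(values):
    """Scale IP/input ratios to mean 0 and standard deviation 1.
    Assumes the values are not all identical (non-zero spread)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def promoter_enrichment(probes, tss, window=600):
    """Summarize probe signals in a window centered at the TSS.
    probes: list of (genomic position, signal) pairs.
    The mean is an assumed choice of summary statistic.
    Returns None when no probe falls inside the window."""
    half = window / 2
    signals = [s for pos, s in probes if abs(pos - tss) <= half]
    return mean(signals) if signals else None

def classify(enrichment, cutoff):
    """Call a promoter 'bound' above a cutoff (hypothetical value that
    would separate the two modes of the binding distribution)."""
    if enrichment is not None and enrichment > cutoff:
        return "bound"
    return "unbound"

probes = [(100, 0.0), (900, 2.0), (1100, 4.0), (2000, 10.0)]
enrichment = promoter_enrichment(probes, tss=1000)
print(enrichment, classify(enrichment, cutoff=1.0))
```

Only the two probes within 300 bp of the TSS contribute to the summary in this toy example; probes outside the 600 bp window are ignored.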

Genes were classified as 'active' when (i) their Affymetrix 
expression value exceeded four (see below) and (ii) RNA 
polymerase II [modENCODE profile (17)] was classified 
as 'bound' on their promoters. A similar result was 
obtained using genes which are 'bound' (modeled on the 
bimodal distribution of the averaged binding along the 
transcribed region) by the elongating polymerase [serine 
2 phosphorylated RNA polymerase II, data from (18)]. 

Promoters were classified as 'peaked', 'broad' and 'weak 
peak' promoters according to Hoskins et al. (19) and Ni 
et al. (20). Hierarchical cluster analysis of the promoter 



binding pattern was carried out using the R function 
'hclust' and the 'complete' or 'ward' clustering approach 
as indicated in the figure legends. 

All available modENCODE chromatin ChIP-chip data 
sets (as of March 2011) were screened for factors that are 
enriched at promoter locations. After initial data 
quality assessment, probe level binding was assessed for 
promoter probes (broad: ±300 bp centered at TSS; 
narrow: ±100 bp centered at TSS; upstream-biased: 
-300 bp to +100 bp relative to TSS), transcriptional termination (TT) 
sites (broad: ±300 bp centered at TT; narrow: ±100 bp 
centered at TT; downstream-biased: -100 bp to +300 bp relative to TT), 
gene probes (probes corresponding to annotated genes 
without promoter and termination probes) and intergenic 
probes (defined as probes not found in previous groups). 
Only ChIP-chip data sets with a clear enrichment for 
promoter probes relative to gene, intergenic and termin- 
ation probes were selected for this study. 

Transcriptome data analysis 

Transcriptome data analysis was conducted as described 
previously (12). Briefly, raw signals were normalized, 
summarized and log2-transformed using the 'gcrma' 
package. Significant changes in gene expression were 
calculated by applying the 'locfdr' package to the 'sam' statis- 
tics using a cutoff of fdr <0.35. Alternatively, genes with 
log2 (NSL1 RNAi - GST RNAi) < -1 were considered 
'down-regulated'. The results are robust to various par- 
ameters in data analysis, as assessed by choosing varying 
thresholds. All expression data set values are 
log2-normalized with a theoretical dynamic range of 
2^16 (Affymetrix.com). 
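
The alternative fold-change criterion can be written down in a few lines. The following Python sketch is illustrative only: the function name is hypothetical, the inputs are assumed to be log2-normalized expression values, and the symmetric 'up-regulated' call is an assumption, since the text defines only the 'down-regulated' class.

```python
def call_regulation(nsl1_rnai, gst_rnai, threshold=1.0):
    """Classify a gene from log2-normalized expression values.
    nsl1_rnai: expression after NSL1 RNAi; gst_rnai: control (GST RNAi).
    A difference below -1 (a more than 2-fold decrease) is called
    'down-regulated' as in the text; the 'up-regulated' call above +1
    is an assumed mirror image of that rule."""
    diff = nsl1_rnai - gst_rnai
    if diff < -threshold:
        return "down-regulated"
    if diff > threshold:
        return "up-regulated"
    return "unchanged"

print(call_regulation(6.0, 8.0))  # difference of -2 on the log2 scale
print(call_regulation(8.5, 8.0))  # difference of +0.5 on the log2 scale
```

Because the values are already on a log2 scale, the subtraction corresponds to the log2 of the expression ratio, so a threshold of 1 is equivalent to a 2-fold change.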

Housekeeping gene definition 

Affymetrix expression data sets of 40 different Drosophila 
tissues [GSE7763, (21)] were processed as described above 
for the NSL1 transcriptome data set. For every gene, the 
standard deviation was calculated across all 40 samples 
(gene variation index). After filtering for active genes, the dis- 
tribution of standard deviations resolved into two major 
populations with the best discrimination at a standard 
deviation of ~1.5 (Supplementary Figure S9A and B). 
Consequently, genes with a gene variation index <1.5 
were considered housekeeping genes and genes with a 
gene variation index >1.5 were considered differentially 
regulated genes. The results are robust to different applied 
thresholds. In an alternative analysis (presented in 
Supplementary Figure S2E), we took the more stringent 
call for housekeeping gene function according to the clas- 
sification of Weber and Hurst (22). Here, active genes 
which belong to either the 'tau' class or to the 'breadth' 
class were considered housekeeping genes. 
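
As a minimal sketch of the gene variation index, the following Python code classifies a gene from its log2 expression values across tissues. The function names are hypothetical, the sample standard deviation is assumed (the text does not state whether the sample or population form was used), and a value of exactly 1.5 is assigned to the differentially regulated class by assumption.

```python
from statistics import stdev

def gene_variation_index(expression):
    """Standard deviation of a gene's log2 expression across tissues.
    The sample standard deviation is an assumed choice."""
    return stdev(expression)

def classify_gene(expression, cutoff=1.5):
    """Housekeeping if the variation index is below the cutoff of 1.5
    given in the text, otherwise differentially regulated."""
    if gene_variation_index(expression) < cutoff:
        return "housekeeping"
    return "differentially regulated"

# Toy values for four tissues; the study used 40 tissue data sets.
print(classify_gene([8.0, 8.2, 7.9, 8.1]))
print(classify_gene([2.0, 10.0, 3.0, 9.0]))
```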

ChIP-Seq data analysis 

NSL1 ChIP-Seq and corresponding input data sets (11) 
were obtained from the ArrayExpress repository 
(E-MTAB-214). Sequence reads were mapped to the 
Drosophila melanogaster genome (dm3) using bowtie 
(23). Uninformative reads and read anomalies were filtered 
out using the R package 'SPP' (24), resulting in 7840131 
unique NSL1 ChIP reads and 6094163 unique input reads. 
Peaks were identified using SPP with the following param- 
eters: 'tag-wtd' method, fdr = 0.01, minimal distance 
between detected peaks =100 bp. The input data was 
used to determine statistical significance of NSL1 peaks, 
resulting in the 'peak score'. 

Core promoter motif analysis 

We used the 10 promoter motifs described by Ohler and 
colleagues (25) to analyze promoter motif occurrences. 
For every motif a log-odds weight matrix description 
P of the binding sites is given, which was used to calculate 
a motif score for a specific sequence. It ranges between 
zero and one and measures how similar a binding site is 
to the consensus. In a first step, the log-odds score for the 
consensus site $L_c$ is determined by 

$$L_c = \sum_{i=1}^{w} \max \{ P_{b,i} : b \in \{A,C,G,T\} \},$$

where w is the motif length. The motif score given a 
specific binding site starting at position k in sequence X 
is calculated by 

$$\text{motif score}(X,k) = \frac{1}{L_c} \sum_{i=1}^{w} P_{x_{k+i-1},\,i}$$

The motif score is the ratio of the log-odds score of the 
site at position k to the log-odds score of the consensus 
site. The motif score for the entire sequence X is given by 
the highest motif score in the sequence: 

$$\text{motif score}(X) = \max_{k} \{ \text{motif score}(X,k) \}$$

For the analyses, we used a threshold of motif score 
>0.3 to consider a binding site as functional. The 
de novo sequence analysis algorithm will be reported else- 
where (Hartmann and Soeding, manuscript in 
preparation). 
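
The motif score defined above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the weight matrix is represented as a list of per-position dictionaries, the toy matrix values are invented for the example, positions are 0-based, and the sequence is assumed to be at least as long as the motif.

```python
BASES = "ACGT"

def consensus_score(pwm):
    """Log-odds score of the consensus site: sum of column maxima."""
    return sum(max(column[b] for b in BASES) for column in pwm)

def site_score(pwm, seq, k):
    """Motif score of the site starting at 0-based position k:
    log-odds score of the site divided by the consensus score."""
    site = seq[k:k + len(pwm)]
    raw = sum(column[base] for column, base in zip(pwm, site))
    return raw / consensus_score(pwm)

def motif_score(pwm, seq):
    """Highest site score over all positions of the sequence.
    Assumes len(seq) >= len(pwm)."""
    last_start = len(seq) - len(pwm)
    return max(site_score(pwm, seq, k) for k in range(last_start + 1))

# Toy two-column log-odds matrix with consensus 'AC' (invented values).
toy_pwm = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
score = motif_score(toy_pwm, "GGACT")
print(score, score > 0.3)  # a site counts as functional above 0.3
```

The sequence in the toy example contains the consensus site, so its best site reaches the maximal score of one.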

RESULTS 

NSL1 colocalizes with MBD-R2 at many active promoters 

The genomic interaction profile of MOF differs in adult 
male and female flies, reflecting its incorporation into the 
male-specific DCC and the general NSL complex (11,12). 
We previously monitored the MBD-R2 distribution in 
adult male and female flies but could not detect any sig- 
nificant difference (12). Since MBD-R2 is the only NSL 
complex protein which may interact with DCC members 
(10) we sought to compare the genome-wide binding 
pattern of the NSL complex with the potential core 
subunit of the complex, NSL1. In order to compare the 
NSL1 interactions in the genomes of adult male and 
female flies, an antibody was raised against NSL1 and 
its specificity was confirmed by combining RNAi with subsequent 
detection by indirect immunofluorescence microscopy 
(IFM) and immunoblotting (Supplementary Figure S1 
and see below). The antibody was then used for 
ChIP-chip experiments, where NSL1 was precipitated 
from chromatin preparations from hand-sorted adult 
male and female flies and the associated DNA was 
amplified and hybridized to high-resolution DNA tiling 
microarrays representing the X chromosome and an 
equivalent amount of the autosomes. The binding profile 
in male and female flies did not show any significant dif- 
ferences (Supplementary Figure S2A). In addition, NSL1 
was found at the same loci as MBD-R2 (Supplementary 
Figure S2B), in agreement with the biochem- 
ical definition of both proteins as 'NSL' complex subunits 
(10-12). The ChIP-chip profiling suggested that NSL1 
globally binds target loci independent of the fly sex, con- 
firming previous ChIP-qPCR analyses at selected loci (11). 

Re-examination of the previously published NSL1 
ChIP-Seq profiles, which had been generated from salivary 
glands of mixed-sex third instar larvae (11), revealed a 
systematic enrichment of NSL1 peaks at RNA polymerase 
II promoters relative to genes transcribed by RNA poly- 
merases I and III (Table 2). Applying an improved peak 
calling algorithm (24) to these data identified the majority 
of NSL1 binding events within a window of 200 base pairs 
(bp) around the annotated TSS (Figure 1A), implicating 
the NSL complex in transcriptional initiation. 

Table 2. NSL1 ChIP-Seq peaks mapped to transcript type 

Transcript type                                                   miRNA   mRNA    ncRNA   rRNA   snoRNA   snRNA   tRNA 
Number of annotated transcripts                                   194     22765   189     160    249      47      314 
Number of NSL1 peaks mapped to transcript TSS                     0       4302    14      0      1        1       5 
Fraction of NSL1-bound transcripts rel. to all transcripts (%)    0       18.9    7.4     0      0.4      2.1     1.6 

In order to avoid the heterogeneous salivary gland tis- 
sue, which impedes a comparison of NSL binding with the 
transcriptional activity and with other known promoter 
binding factors, an NSL1 ChIP-chip profile was generated 
from Drosophila S2 cells. These cells are commonly used in 
the chromatin community because they provide a homo- 
geneous biological material, a fact that allows comparing 
our data to other published genomic data sets, such as the 
comprehensive collection of chromatin factors and histone 
modifications generated by the modENCODE consortium 
with a similar ChIP-chip strategy (17). 

The newly generated NSL1 ChIP-chip profile correlated 
well with our previously published MBD-R2 profile (12) 
as well as with the MBD-R2 profile generated by the 
modENCODE consortium using a different antibody 
(Supplementary Figure S2C). Therefore, in the following 
we subsume the individual NSL1 and MBD-R2 profiles as 
the 'NSL complex' binding, unless stated otherwise. We 
related the NSL complex binding at promoters with the 
transcriptional activity of the corresponding genes, using 
the ChIP-chip profile of the elongating polymerase as a 
direct readout for active transcription (18). The NSL 
complex binds active genes with high preference, but 
only a subset of them (~60-70%, depending on the threshold) 
(Figure 1B, left). A similar result was obtained when 
displaying gene activity as a function of polymerase pro- 
moter binding (Figure 1B, right) or Affymetrix RNA ex- 
pression profiling (data not shown), in agreement with 
previous studies examining other markers of the NSL 
complex (11,12). 

Figure 1. The MOF-containing NSL complex is enriched at promoters of most housekeeping genes. (A) NSL1 peaks map close to the TSS. The 
histogram displays the distance between the summit of each NSL1 ChIP-Seq peak (11) and the closest TSS. Refinement of NSL1 positions relative to the 
TSS compared to (11) was achieved by using an improved peak calling algorithm [SPP (24)] and more precisely mapped TSSs (56). (B) NSL1 
prevalently binds active gene promoters. Scatter plots of NSL1 promoter binding versus (i) elongating polymerase at all genes (left) (18), (ii) total 
polymerase promoter occupancy at all genes (right) [modENCODE (17)]. Significant binding cutoffs of NSL1 (red) and polymerase (blue) are 
indicated. The density plots on the top of each scatter plot depict the signal distribution of the elongating polymerase (left) and the total polymerase 
(right). The density plot to the right indicates the NSL1 promoter signal distribution for all genes (black) and active genes [red, based on the 
elongating polymerase II gene signal (18)], respectively. (C) The NSL complex preferentially associates with housekeeping genes (Welch two sample 
t-test, P < 2.2e-16). The boxplot depicts NSL1 binding at differentially regulated and housekeeping genes (for categorization, see 'Materials and 
Methods' section). A similar result is gained using the available MBD-R2 ChIP-chip data sets (12). An alternative, more stringent categorization for 
housekeeping genes after Hurst and colleagues (22) is shown in Supplementary Figure S2. (D) The NSL complex prevalently binds to dispersed 
promoters ('broad with peak' and 'weak peak' promoters) over peaked promoters (Welch two sample t-test, P < 2.2e-16). Density plot of NSL1 
binding at genes, which were grouped according to their transcriptional start site usage into 'peaked' promoters, 'broad with peak' promoters and 'weak 
peak' promoters using the data of (19). The window is split for NSL1-bound (right) and -unbound (left) promoters. 

The NSL complex specifically binds promoters of most 
housekeeping genes 

As noted above, the NSL1 profile is very similar in nuclei 
of different sex and developmental stage despite significant 
expression differences (Supplementary Figure S2A and D). 
This indicates that the NSL complex may associate 
with 'housekeeping' genes, which are equally expressed in 
these diverse tissues. To test this hypothesis, we classified 
genes as 'housekeeping' or 'differentially regulated' ac- 
cording to their expression variation index, i.e. the stand- 
ard deviation of expression, when compared between 
several Drosophila tissues (21). According to this classifi- 
cation the NSL complex showed a significant preference 
for 'housekeeping' over 'differentially regulated' genes 
(Figure 1C). The same conclusion was reached when 
'housekeeping' genes were classified according to the 
more exclusive definition of Hurst and colleagues (22) 
(Supplementary Figure S2E). This conclusion is further 
illustrated by a gene ontology (GO) analysis of bound 
and unbound genes, which revealed that active NSL- 
bound genes are enriched in housekeeping functions such 
as 'cofactor biosynthetic processes', 'microtubule-based 
processes', 'protein complex biogenesis' (Supplementary 
Figure S3), whereas active genes which are not bound by 
the NSL complex are enriched in categories such as 'sen- 
sory perception', 'cell adhesion' and 'tissue developmental 
genes' (Supplementary Figure S4). 

Recent improvements in high-throughput RNA 
profiling techniques facilitated quantitative mapping of 
TSSs at base pair resolution (19,20). Whereas some pro- 
moters possess a well-defined TSS, where transcription reli- 
ably initiates within a few base pairs ('focused' or 'peaked' 
promoters), many promoters show a dispersed zone of 



transcription initiation of up to a few hundred base 
pairs, which may be dominated by a major TSS ('broad 
promoters') or not ('weak peak promoter') (20). Notably, 
differentially regulated genes tend to have peaked pro- 
moters whereas housekeeping genes are enriched for 
broad or weak promoters (19). Concordantly, we found 
that the NSL complex is strongly overrepresented at pro- 
moters of the latter classes (Figure 1D). 

The NSL complex activates only a specific subset of 
bound genes 

It has remained controversial whether NSL target genes 
are activated or repressed after RNAi ablation of NSL 
complex components (11-13). Akhtar and coworkers 
observed that similar fractions of NSL target genes were 
up- or downregulated following RNAi against MOF, 
NSL3 and MBD-R2 and subsequent microarray-based 
transcriptome profiling (11,13). By contrast, we found 
that the transcription of genes that had the NSL subunit 
MBD-R2 bound was mostly reduced when MBD-R2 
levels were lowered (12). However, since MBD-R2 is the 
only NSL complex subunit that was suggested to interact 
with components of the DCC (10), it was necessary to 
exclude indirect effects. We therefore examined the expres- 
sion of NSL target genes after depletion of the core 
subunit of the NSL complex, NSL1. 

RNAi against NSL1 in S2 cells efficiently depleted 
the protein as examined by immunoblotting and IFM 
(Supplementary Figure S1). Genome-wide transcriptome 
profiling of the NSL1-depleted cells revealed the down- 
regulation of a considerable fraction of genes (Figure 2A), 
most of which had been classified as 'NSL-bound' before 
(Figure 2B). This is consistent with reporter gene assays 
where the transcription brought about by tethering MOF 
to a model promoter was diminished upon NSL1 deple- 
tion (for details, see Supplementary Figure S5). 
Importantly, the expression of the majority of NSL1 
target genes was unchanged (Figure 2A), such that only 
20-30% of them (depending on the threshold) required 
NSL1 for proper expression. The same trend had been 
observed earlier in the context of MBD-R2 (12) (and 
data not shown). The MBD-R2 ChIP-chip profile and the 
MBD-R2 RNAi transcriptome data are indeed very 
similar to the NSL1 data (Supplementary Figures S2C 
and S5C), arguing that they form a functional complex 
bound to chromatin. 

Figure 2. The NSL complex activates only a subset of its target promoters. (A) The histogram depicts transcriptome changes upon NSL1 depletion 
in S2 cells, plotted as log2 (NSL1 RNAi - GST RNAi). The overlaid modeled normal distribution (red) reveals a skew toward the population of down-regulated genes. The gene set was filtered 
for active genes based on the transcriptome of control cells. (B) Proportional Venn diagram depicts genes 2-fold up- or down-regulated after RNAi 
against NSL1, respectively, and NSL1-bound genes (NSL1 ChIP-chip in S2 cells). Numbers in parentheses indicate the size of the respective gene 
sets. Only active genes (total polymerase promoter occupancy determined by the modENCODE consortium) represented on our custom 
tiling microarray are shown. 

We next asked whether the genes that were activated by 
the NSL complex coded for related housekeeping func- 
tions. The GO classification revealed that the genes 
whose expression was diminished upon NSL1 depletion 
were enriched in genes involved in nucleic acid metabol- 
ism, such as genes involved in transcription, RNA pro- 
cessing, translation, DNA replication and DNA repair 
(Supplementary Figure S6). Evidently, the NSL complex 
only activates a specific subset of the many housekeeping 
genes. In order to explore whether the promoters of 
NSL-responsive genes could be recognized by a combin- 
ation of cis-elements and trans-factors, we set out to iden- 
tify chromatin proteins with genome binding profiles 
related to the NSL complex and to investigate whether 
the promoters regulated by NSL shared particular core 
promoter motifs. 

The NSL complex co-occupies target promoters 
together with the chromatin remodeler NURF 
and the histone methyltransferase Trithorax 

The NSL1 ChIP-chip profile in S2 cells allowed a direct 
comparison with the chromatin profiles recorded by the 
modENCODE consortium (17), which used the same cell 
line and the same profiling technique. We mined the 
modENCODE data for profiles of general chromatin fac- 
tors (excluding sequence-specific transcription factors) 
and histone modifications that are preferentially en- 
riched at promoters (see 'Materials and Methods' section 
for details of the selection procedure). We 
created a pairwise correlation matrix for 23 selected pro- 
tein and histone modification profiles and performed an 
unsupervised hierarchical clustering to reveal the extent of 
correlation with the NSL complex. We found the profiles 
of the interband protein Chromator, the WD40-repeat 
protein WDS, the NURF complex subunit NURF301 
and the methyltransferase Trithorax highly correlated 
with the NSL complex profile (Figure 3A and B; 
Supplementary Figure S7). Chromator had been found 
in an early NSL complex purification (10) but could not 
be recovered in more recent experiments (11,12), 
possibly due to a more transient or indirect interaction. 
Notably, 5-15% of promoters which contain NSL1, 
MBD-R2, WDS, NURF301 and Trithorax lack 
Chromator. The WD40-repeat protein WDS consistently 
copurifies with NSL complex members (10-12) and other 
chromatin complexes including the Drosophila ATAC 
acetyltransferase complex (26) and mammalian MLL 
methyltransferase complexes (27,28). NURF301 is the 
diagnostic marker subunit of the Imitation Switch 
(ISWI)-containing nucleosome remodeling factor 
NURF, which stimulates transcription by remodeling 
promoter nucleosomes (29,30). Trithorax was originally 
described to counteract the repression of homeotic genes 



by the polycomb group proteins (31-33). More recently, 
genome-wide ChIP-chip studies have indicated a wide- 
spread binding of Trithorax to many promoters (34,35). 

The pairwise relationships between the tested factors 
are further illustrated by the scatter plots depicted in 
Figure 3C, which emphasize that the NSL complex, 
WDS, Chromator, NURF301 and Trithorax co-occupy 
target promoters at linearly proportional levels 
(Figure 3C). Promoters which are strongly bound by the 
NSL complex are also highly enriched for NURF301, 
Chromator and Trithorax. The same strong correlation 
can be seen in an unbiased analysis using all microarray 
probe signals, confirming the promoter-focused analysis 
described above (Supplementary Figure S7B). 
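
As an illustration of the pairwise comparison described above, the following minimal sketch computes a Spearman correlation matrix for a few promoter binding profiles. It is not the code used in this study: the helper functions, the profile name `factor_X` and all values are hypothetical and serve only to make the example self-contained.

```python
from itertools import combinations


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def correlation_matrix(profiles):
    """Pairwise Spearman correlations for a dict of name -> binding values."""
    names = list(profiles)
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        rho = spearman(profiles[a], profiles[b])
        matrix[a][b] = matrix[b][a] = rho
    return matrix


# Hypothetical binding values, one entry per promoter (not measured data).
profiles = {
    "NSL1": [0.1, 0.8, 1.5, 2.9, 3.2],
    "NURF301": [0.2, 0.5, 1.1, 2.0, 4.0],
    "factor_X": [3.0, 2.5, 1.0, 0.4, 0.2],
}
matrix = correlation_matrix(profiles)
```

With real data, each list would hold one binding value per active autosomal promoter, and the resulting matrix could be handed to a hierarchical clustering routine to order the factors.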

Searching for factors enriched at promoters, we found
the heterochromatin protein 1c (HP1c) and, consistent
with previous results (36), the insulator proteins BEAF32
and CP190 (37) enriched at housekeeping promoters
(Supplementary Figure S8). These factors localize to
minor subsets of the NSL/Chromator/NURF301/
Trithorax target promoters (Figures 3 and 5;
Supplementary Figures S7B and S11). Importantly, the
presence of BEAF32, CP190 and HP1c determines
whether the bound NSL complex functions as an activa- 
tor or not (see below). 

Quantitative NSL1 binding correlates best with the 
DNA replication-related element 

Conceivably, the association of the NSL complex and its 
colocalized chromatin modifiers may be determined by a 
particular core promoter architecture. Different pro- 
moters are characterized by the presence and combination 
of a range of sequence motifs that provide contact surfaces 
for general transcription factors and, therefore, modulate 
the formation of the transcription pre-initiation complex 
(38-40). The core promoter sequence motifs can be clas- 
sified as canonical core promoter motifs which have fixed 
positions with regard to the TSS, such as the TATA box, 
the MTE (motif ten element), the DPE (downstream core 
promoter element) and the INR (initiator), or as motifs 
with weaker positional information (Ohler 1, Ohler 5, 
Ohler 6, Ohler 7, Ohler 8 and DRE) (25,41). Canonical 
core promoter motifs are enriched in peaked promoters, 
whereas weakly positioned motifs are characteristic of 
dispersed promoters. The mechanisms of action of most 
dispersed elements are unknown [with the exception of the 
DRE (39)]. 

Figure 3. The NSL complex co-occupies target promoters together with the chromatin remodeler NURF and the histone methyltransferase
Trithorax. (A) Heat map visualization of the correlation matrix of promoter-enriched chromatin factors and histone modifications at active genes. Pairwise
Spearman correlations were calculated using only active autosomal genes. The dendrogram shows the hierarchical clustering after which the matrix
was sorted. (B) ChIP-chip profile of the indicated proteins along a representative region of chromosome arm 2L. The gene structure is indicated
below (active genes are red). (C) Pairwise scatter plot of promoter binding for each indicated factor using only active autosomal genes. Spearman
non-parametric correlation coefficients are provided for each pair.

Since NSL1 peaks within the core promoters of genes
with dispersed transcriptional start sites (Figure 1A and
D), we investigated whether the NSL complex is associated
with a specific set of core promoter motifs. We first
characterized the core promoter motifs with regard to
their distribution at active housekeeping and differentially
regulated genes (Supplementary Figure S9). As the motifs
deviate from their defined consensus sequences in many
cases, a similarity score (motif score) was calculated for
each promoter reflecting the similarity of the sequence to
any of the ten promoter consensus motifs described by
Ohler and colleagues (25) (see 'Materials and Methods'
section). We found that over 70% of all active promoters
can be described by these ten motifs, indicating that our
analysis is representative (Supplementary Figure S9). In
agreement with previous analyses (41-43), the promoter
motifs INR, MTE and DPE were clearly overrepresented
in differentially regulated genes, which fits their enrichment
at peaked promoters (Supplementary Figures S8
and S9). Accordingly, housekeeping genes are enriched
for the motifs DRE, Ohler 1, Ohler 5, Ohler 6 and Ohler 7.
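
To make the notion of a motif score concrete, the sketch below scores a promoter sequence against a position weight matrix by taking the best log-odds match over all windows. This is one common way to quantify motif similarity and is assumed here for illustration only; the scoring actually used in this study is described in 'Materials and Methods'. The toy matrix, the uniform background and the pseudocount are assumptions, not values from the study, and sequences are assumed to contain only A, C, G and T.

```python
import math

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(pwm, pseudocount=0.01):
    """Convert per-position base probabilities into log2 odds vs. background."""
    scored = []
    for column in pwm:
        total = sum(column.values()) + 4 * pseudocount
        scored.append({
            base: math.log2(
                ((column.get(base, 0.0) + pseudocount) / total) / BACKGROUND[base]
            )
            for base in "ACGT"
        })
    return scored


def motif_score(sequence, pwm):
    """Best log-odds score of the motif over all windows of the sequence."""
    matrix = log_odds(pwm)
    width = len(matrix)
    best = -math.inf
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        best = max(best, score)
    return best


# Hypothetical four-position motif with consensus ACGT (not one of the
# ten motifs analyzed in the text).
toy_pwm = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}, {"T": 1.0}]

perfect = motif_score("TTACGTTT", toy_pwm)     # contains the consensus
mismatch = motif_score("TTAGGTTT", toy_pwm)    # one position deviates
```

A promoter containing an exact consensus match receives the highest possible score, and each deviating position lowers it, which is the behavior a similarity score needs when motifs frequently depart from their consensus.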

Being able to characterize the core promoter motifs 
allowed us to examine whether there is a differential asso- 
ciation of NSL with any of them. Active genes were 
categorized either as NSL targets or as non-targets based 
on their NSL complex promoter occupancy. As expected, 
the NSL1 target genes are enriched for the housekeeping 
promoter motifs DRE, Ohler 1, Ohler 5, Ohler 6 and 
Ohler 7 and depleted for TATA, INR, MTE and DPE. 
Consistently, when we performed de novo motif analysis
of the sequences covered by the NSL1 ChIP-Seq peaks
(11), we again obtained the same motifs (Supplementary
Figure S10). This confirms that NSL1 peaks at core
promoter motifs that are diagnostic for housekeeping genes.

Is there any correlation between the 'strength' of NSL1
binding and how well an underlying motif matches its
consensus sequence? To address this question, we
used the NSL1 ChIP-Seq data set (11), whose
good dynamic range allowed us to use the ChIP-Seq
peak score as a surrogate for binding 'strength'. We
binned the ChIP-Seq peaks in equally sized groups according
to their peak score [determined by SPP (24)] and
displayed the fraction of promoters bound by a specific
group at a given motif score (Figure 4A, left). Among the
ten tested core promoter motifs, the DRE motif and, to a
lesser extent, motif Ohler 7 are the only motifs with scores
that correlate with the NSL1 ChIP-Seq peak score. This
suggests that DRE-containing promoters (and those containing
the less abundant Ohler 7 motif) primarily contain
NSL complex targeting cues (Figure 4B).
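
The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of this study: peaks are represented as hypothetical (peak score, motif score) pairs, the function names are invented for the example, and the threshold is arbitrary.

```python
def equal_sized_bins(peaks, n_bins):
    """Split (peak_score, motif_score) pairs into n_bins groups of
    near-equal size, ordered from weakest to strongest peak score."""
    ordered = sorted(peaks, key=lambda p: p[0])
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # The first `extra` bins receive one additional peak.
        end = start + size + (1 if b < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def fraction_exceeding(group, threshold):
    """Fraction of promoters in a group whose motif score exceeds threshold."""
    if not group:
        return 0.0
    return sum(1 for _, motif in group if motif > threshold) / len(group)


# Hypothetical (peak score, motif score) pairs; not measured values.
peaks = [(5, 0.1), (12, 0.3), (20, 0.2), (33, 0.6), (41, 0.7), (58, 0.9)]
bins = equal_sized_bins(peaks, 3)
curve = [fraction_exceeding(group, 0.5) for group in bins]
```

If a motif score scales with binding 'strength', the fraction of promoters exceeding a given motif score rises from the weakest to the strongest bin, as it does for the invented values here.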

The combination of chromatin factors and core promoter
motifs enhances the prediction of NSL-regulated promoters

Whether or not a promoter-bound transcription factor 
engages in active regulation often depends on the context 
of close-by cis elements and interacting factors (44). This 
appears to be the case for the NSL complex, as we showed 
that the complex only activates a subset of the promoters 
it associates with. NSL binds with high preference to a set 
of housekeeping promoter motifs and its binding 
'strength' correlates best with the presence of the DRE 
motif. Can the subset of these NSL targets whose tran- 
scription is diminished after depletion of NSL (i.e. those 
promoters at which the complex is functional as an acti- 
vator) be distinguished at the sequence level? We grouped 
active genes according to their core promoter motif class 
(see 'Materials and Methods' section) and monitored the 
transcriptome changes after NSL depletion for each 
group. Strikingly, only promoters containing the core 
promoter motif 'Ohler 5' were strongly enriched for 
NSL complex functional sites (Figure 5A). We note that 
'Ohler 5'-containing promoters do not show the strongest 
correlation to NSL binding strength (Figure 4B),
suggesting that quantitative differences in factor binding
are not directly translated into a functional output.
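
The grouping of genes by core promoter motif class can be illustrated with a short sketch that computes, for each class, a one-sample t statistic testing whether the mean expression change after NSL depletion differs from zero. The gene identifiers, class assignments and log2 fold changes are hypothetical, and the function names are introduced only for this example; it is a sketch of the general approach rather than the analysis performed here.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """t statistic for the null hypothesis that the mean of values equals mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def t_by_motif_class(log2_fold_changes, motif_class):
    """Group genes by motif class and compute a t statistic for each group.
    Classes with fewer than two genes are skipped."""
    groups = {}
    for gene, change in log2_fold_changes.items():
        groups.setdefault(motif_class[gene], []).append(change)
    return {cls: one_sample_t(vals) for cls, vals in groups.items() if len(vals) > 1}


# Hypothetical log2 fold changes after NSL depletion (not measured values).
changes = {"g1": -1.0, "g2": -1.2, "g3": -0.8, "g4": 0.1, "g5": -0.1, "g6": 0.0}
classes = {"g1": "Ohler 5", "g2": "Ohler 5", "g3": "Ohler 5",
           "g4": "DRE", "g5": "DRE", "g6": "DRE"}
t_values = t_by_motif_class(changes, classes)
```

A strongly negative t statistic for one class, with values near zero for the others, corresponds to the situation in which only that class is enriched for promoters whose transcription depends on the complex.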

We had observed that most promoters bound by HP1c,
BEAF32 and CP190 are among those also occupied by the
NSL complex (Figure 3 and Supplementary Figure S11).
Most HP1c, BEAF32 and CP190 binding occurs
at distinct subsets, as the three factors colocalize only
at a minority of sites (Supplementary Figure S11).
Intriguingly, promoters bound by any of the three factors
HP1c, BEAF32 or CP190 are clearly underrepresented
among the genes whose transcription is activated by the
NSL complex (Figure 5B and Supplementary Figure S11).
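
The set comparison underlying this observation can be sketched with plain set operations. The gene identifiers and set memberships below are invented for illustration, and the function name is hypothetical; the sketch only shows how the fraction of NSL-dependent genes can be compared between bound promoters with and without one of the three factors.

```python
def activated_fraction(bound, down_regulated, blocked):
    """Return the fraction of down-regulated genes among NSL-bound promoters
    (with any negative factor, without any negative factor)."""
    with_factor = bound & blocked
    without_factor = bound - blocked

    def fraction(genes):
        return len(genes & down_regulated) / len(genes) if genes else 0.0

    return fraction(with_factor), fraction(without_factor)


# Hypothetical gene sets (not taken from the data sets analyzed here).
nsl_bound = {"a", "b", "c", "d", "e", "f"}
down_after_depletion = {"a", "b", "c", "g"}
hp1c, beaf32, cp190 = {"c"}, {"d"}, {"d", "e"}

# A promoter counts as occupied if at least one of the three factors binds.
any_factor = hp1c | beaf32 | cp190
with_f, without_f = activated_fraction(nsl_bound, down_after_depletion, any_factor)
```

A lower fraction among promoters carrying at least one of the factors than among those carrying none corresponds to the underrepresentation described in the text.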

In summary, the data suggest that the functionality of a
promoter-associated NSL complex is modulated by positive
effectors (e.g. unidentified interactors of the 'Ohler 5'
element) and negative regulators (HP1c and the insulator
proteins BEAF32 and CP190).

DISCUSSION 

In this study, we show that the NSL complex is a potential 
coactivator, which binds to many active genes, but regu- 
lates only a specific subset of them. In our efforts to 
describe the circumstances that define complex association 
and function, we considered the contributions of two 
major parameters: the diverse DNA sequences around 
the core promoters, which are characterized by combin- 
ations of recurring sequence motifs, and the association of 
chromatin regulators that have recently been mapped by 
the modENCODE consortium. Combining these diverse 
data sets, we were able to improve the prediction of
whether the transcription of an NSL-bound gene is 
modulated by the NSL complex. To our knowledge, this 
is the first systematic study demonstrating the usefulness 
of this type of data integration. 

The NSL complex is a transcription cofactor dedicated to 
housekeeping genes 

Following our observation that the NSL complex binds to 
only a subset of all active promoters, we discovered that 
the target genes were mostly housekeeping genes. This was 
surprising as to our knowledge so far no transcription 
coregulator dedicated to housekeeping genes is known. 
This may simply reflect the fact that historically the mech- 
anisms underlying differential transcription regulation 
received more attention. Several lines of evidence support 
the conclusion that the NSL complex preferentially localizes
to the majority of housekeeping promoters. (i) We do
not detect significant differences in the global chromatin
binding profile of NSL complex members in cells of different
sex or developmental stage. (ii) Genes that have
NSL bound at their promoters show little expression variation
among different tissues as compared to active genes
that lack the NSL complex. (iii) NSL-bound promoters
are depleted of sequence motifs known to be enriched in 
genes differentially regulated during development and in 
tissue homeostasis (38). (iv) GO analysis of the active 
NSL-bound genes revealed an overrepresentation of 
categories for housekeeping functions, whereas the 
converse data set of active genes not bound by NSL
presents diverse categories including 'developmental programs'
and 'acute signaling'. Other chromatin constituents,
like HP1c and the insulator proteins BEAF32 and
CP190, also interact preferentially with housekeeping gene
promoters, as previously shown by Ohler and colleagues
(36), but these factors bind to a much more limited
number of genes in this class. Our analysis supports the
concept of global coregulation of functionally related gene
classes by common cofactors.

Figure 4. NSL1 binding correlates best with the DRE motif score. (A) Promoters containing motifs Ohler 1, Ohler 5, DRE and Ohler 7 are
overrepresented among the NSL complex target genes. Active genes were grouped into NSL1-bound and NSL1-unbound genes, as determined by
NSL1 ChIP-chip in S2 cells. The curves display the cumulative fraction of promoters that exceed a given motif score. The motif score is derived
from the information content of the position weight matrix, as taken from Ohler and colleagues (25); see 'Materials and Methods' section for details.
(B) NSL1 binding 'strength' linearly scales to the DRE motif score. NSL1 ChIP-Seq peaks (11) are grouped in equal-sized bins according to their
peak score. Similar to (A), the fraction of promoters is displayed relative to the motif score.

Figure 5. A combination of chromatin factors and core promoter motifs enhances the prediction of NSL transcriptional targets. (A) The core
promoter motif Ohler 5 is overrepresented in genes that are regulated by the NSL complex (P < 2.2e-16, one-sample t-test). Density plot
of the transcriptome changes after RNAi against NSL1 for groups of genes enriched for the ten core promoter motifs defined by Ohler et al. (25). In
the presented analysis only active NSL-bound genes were considered (using the modENCODE MBD-R2 data set (17), which allows the comparison
to a larger transcriptome data set). The same conclusion was reached if all active genes (without filtering for NSL complex binding) or different 'NSL
complex binding' criteria (such as NSL1 binding) were employed (data not shown). (B) Venn diagram representation of gene sets bound by the NSL
complex (modENCODE MBD-R2 ChIP-chip profile), down-regulated after ablation of the NSL complex and bound by at least one of the three
investigated proteins BEAF32, CP190 and HP1c (17). Numbers in brackets represent the number of genes in the respective group. The
modENCODE MBD-R2 data set served as a surrogate for the NSL complex to minimize technical variation in the comparison with the
BEAF32/CP190/HP1c data sets. Similar results were obtained using the NSL1 data set and the MBD-R2 (12) data set, respectively.

The developmental regulators NURF and Trithorax 
colocalize with NSL at housekeeping promoters 

The extensive colocalization of the NSL complex with the 
methyltransferase Trithorax and the chromatin remodeler 
NURF is puzzling since those factors are best known as 
regulators of transcription of very restricted sets of genes 
(developmental and highly inducible genes) (30,32), and 
only recently has their extensive genome-wide localization 
at many active gene promoters been noticed (34,35,45,46). 
Conceivably, these three complexes cooperate to regulate 
the transcription of housekeeping genes at the level of 
chromatin organization and/or transcription initiation. 
This hypothesis is supported by previous reports of bio- 
chemical or genetic interactions between components 
of the three factors. A genetic interaction between the 
Xenopus BPTF (the NURF301 homolog) and Xenopus 
WDR5 (a homolog of the NSL subunit WDS) has been 
reported (47). Furthermore, Dou et al. (27) described a 
'supercomplex' containing the human NSL as well as the 
MLL1 complexes [MLL1 is homologous to Drosophila 
Trithorax]. 



At present it is not clear whether NURF and Trithorax- 
containing complexes contribute to the targeting of the 
NSL complex (or vice versa), or whether all three regula- 
tors are attracted by an additional common denominator 
of target promoters. None of the three complexes contains 
any specific DNA-binding subunit. NURF can be re- 
cruited to inducible genes via direct interactions between 
the large NURF301 subunit and transcription factors, 
such as the GAGA factor (29) or the ecdysone receptor 
(48). However, these interactions certainly do not explain 
the widespread targeting of NURF to housekeeping genes 
in vivo reported here. We noted a good quantitative cor- 
relation between the NSL1 binding levels and the DRE 
core promoter motif score, which opens the possibility that 
a DRE-recognizing factor may stimulate NSL recruitment.
One candidate for such a factor is DREF, which
has been isolated as a DRE binding factor (49). DREF 
may also contribute to the recruitment of NURF, since an 
association of DREF with NURF has been observed in a 
much larger complex based on the immunoaffinity purifi- 
cation of the TATA box binding protein (TBP)-related 
factor TRF2 (39). 

In addition to direct recruitment by DNA-binding 
proteins, transcription cofactors may be tethered by spe- 
cific local histone modifications through recognition 
domains (50). It is likely that this principle will also con- 
tribute to the observed colocalization of NSL, NURF and 
Trithorax complexes. Trithorax (the Drosophila MLL1 
homolog) is an enzyme that methylates histone H3 at 
lysine 4 (H3K4me3), a mark that characterizes active promoters
(46). Interestingly, WDS, which copurifies with
NSL complexes from Drosophila and mammalian cells
(27), has been shown to preferentially interact with
methylated H3K4 (28). The mammalian homolog of
NURF301 (BPTF) also recognizes mononucleosomes
marked with methylated H3K4 and acetylated H4K16
through its PHD finger and bromodomain, respectively
(51). Acetylation of H4K16 by MOF in the NSL 
complex may, therefore, contribute to the local enrich- 
ment of NURF at target promoters. Our study gives rise 
to numerous testable hypotheses as to the nature of the 
interaction network that leads to the observed selective 
targeting of the NSL complex. 

The NSL complex only regulates a subset of target 
promoters 

The detailed analysis of the transcriptional effects of the 
NSL complex revealed that the NSL complex regulates 
only a subset of bound genes. Such a situation is not 
without precedent as it has been shown for a number 
of transcription factors that many binding events appear 
to be non-functional (44). In fact, it is a major challenge to 
predict the functional sites from the interaction profiles of 
single factors as functionality is frequently determined by 
the local clustering of binding sites, synergism between 
colocalized proteins, and recently, chromatin accessibility 
(52,53). Accordingly, we favor the idea that a combination 
of chromatin factors and core promoter elements deter- 
mines the activity of the NSL complex at any target 
promoter. An even more immediate influence of
promoter DNA on interacting proteins may be imagined,
as a direct effect of a DNA sequence on the conformation
and, therefore, the activity of a bound transcription factor
has been described (54).

Alternatively, it is possible that the default state of every 
chromatin-bound NSL complex is functional, but that the 
realization of this potential is restricted by negative 
factors. We found that the presence of any one of the
three proteins HP1c, BEAF32 or CP190 correlated with
a lack of NSL1 regulation. Insulator binding proteins like
BEAF32 and CP190 are known to decrease enhancer-promoter
interactions, which may lead to decreased transcriptional
output. Interestingly, antagonistic roles for
BEAF32 and DREF have been suggested for some over- 
lapping in vivo binding sites (55). Resolving the mechan- 
istic intricacies of complex promoter regulation remains a 
challenging task for future endeavors. 